Zomato Exploratory Data Analysis

Step 1: Importing dataset and necessary libraries

The dataset is from https://www.kaggle.com. The file size is about 547MB.

Step 2: Data preparation & cleaning

Load the dataset into a data frame using Pandas. Explore the number of rows & columns, ranges of values etc. Handle missing, incorrect and invalid data. Perform any additional steps (parsing dates, creating additional columns, merging multiple datasets etc.).

Step 3: Statistical analysis & visualization

Compute the mean, sum, range and other interesting statistics for numeric columns. Explore relationships between columns using scatter plots, bar charts etc. Make a note of interesting insights from the exploratory analysis. Your notebook should contain at least 8 graphs & 4 different types of graphs.

Step 4: Ask & answer questions about the data

Ask at least 8 interesting questions about your dataset. Answer the questions either by computing the results using Numpy/Pandas or by plotting graphs using Matplotlib/Seaborn/Plotly/Folium. Create new columns, merge multiple datasets and perform grouping/aggregation wherever necessary. For each question, summarize the key insight from the analysis or visualization in simple words.

Step 5: Summary and conclusion

Write a summary of what you've learned from the analysis. Include interesting insights and graphs from previous sections. Share ideas for future work on the same topic using other relevant datasets. Share links to resources you found useful during your analysis.

Step 1: Importing dataset and necessary libraries

Step 2: Data preparation & cleaning

Loading the csv file into the pandas dataframe
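A minimal sketch of the loading step. The file name `zomato.csv` and the sample rows are assumptions for illustration only; an in-memory CSV stands in for the real file so that the snippet runs on its own.

```python
import io

import pandas as pd

# Tiny made-up stand-in for the real file; the column names are assumed
# from the columns discussed in this notebook.
sample_csv = io.StringIO(
    "name,online_order,book_table,rate,votes,location\n"
    "Restaurant A,Yes,Yes,4.1/5,120,BTM\n"
    "Restaurant B,Yes,No,3.9/5,45,HSR\n"
    "Restaurant C,No,No,3.7/5,8,BTM\n"
)

# With the real data this would be pd.read_csv("zomato.csv")
# (hypothetical file name).
df = pd.read_csv(sample_csv)

print(df.shape)   # number of rows and columns
print(df.dtypes)  # column types as inferred by pandas
```

Checking `shape` and `dtypes` right after loading shows which columns pandas read as text and will therefore need cleaning.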

Data cleaning

The URL, address, menu and phone number columns are not required, so I am going to drop them. In the other columns, I will replace the null values with appropriate values. The rating is given out of 5, so I am going to keep only the rating itself and remove the "out of 5" part from the column.

Dropping the unnecessary columns
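A small sketch of the drop, assuming the four columns are named `url`, `address`, `phone` and `menu_item`. These names are my assumption; the real names should be checked with `df.columns` first.

```python
import pandas as pd

# Made-up two-row frame; the column names are assumptions.
df = pd.DataFrame({
    "url": ["u1", "u2"],
    "address": ["a1", "a2"],
    "phone": ["p1", "p2"],
    "menu_item": ["m1", "m2"],
    "name": ["Restaurant A", "Restaurant B"],
    "rate": ["4.1/5", "3.9/5"],
})

# Columns that carry no information for this analysis.
cols_to_drop = ["url", "address", "phone", "menu_item"]
df = df.drop(columns=cols_to_drop)

print(df.columns.tolist())
```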

=> Rate Column
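One possible way to strip the "out of 5" part, sketched on made-up values. I am assuming the raw ratings are strings such as `4.1/5`, and that anything that is not a number (the `NEW` entry here is a hypothetical example) should become a missing value.

```python
import pandas as pd

# Made-up raw ratings; the exact raw formats are an assumption.
rate = pd.Series(["4.1/5", "3.8 /5", "NEW", None])

# Keep the part before the slash, trim spaces, and convert to a number.
# Entries that cannot be parsed become NaN instead of raising an error.
rate_num = pd.to_numeric(
    rate.str.split("/").str[0].str.strip(),
    errors="coerce",
)

print(rate_num)
```

How the remaining missing ratings are filled (mean, median, or left as they are) is a separate choice that this sketch does not make.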

=> Votes Column

The votes column contains only integers, so there is no need to clean it.

=> Location Column

There are two location columns: one is 'location', the other is 'listed_in(city)'. I am going to drop the one that has fewer unique values.

I am going to group the least frequent locations under an 'others' category.
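A sketch of the grouping on made-up values. The threshold of 2 is only for this toy example; the cut-off used on the real data is not stated here.

```python
import pandas as pd

# Made-up location values.
location = pd.Series(["BTM", "BTM", "BTM", "HSR", "HSR", "Peenya", "Yelahanka"])

# Count how often each location appears.
counts = location.value_counts()

# Hypothetical cut-off: locations seen fewer times than this become "others".
threshold = 2
rare = counts[counts < threshold].index

# Keep frequent locations as they are and relabel the rare ones.
location = location.where(~location.isin(rare), "others")

print(location.value_counts())
```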

=> Cuisines Column

=> approx_cost(for two people) Column

I am going to remove the commas and convert the values to integers.
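A sketch of the conversion on made-up values, assuming the costs are stored as strings such as `1,200`. The nullable `Int64` type is my choice here so that missing costs can stay missing.

```python
import pandas as pd

# Made-up cost values as they might appear in the raw column.
cost = pd.Series(["800", "1,200", "300", None])

# Remove the thousands separator, then convert to numbers.
cost_num = pd.to_numeric(cost.str.replace(",", "", regex=False), errors="coerce")

# Nullable integer type keeps the missing entry as <NA>.
cost_num = cost_num.astype("Int64")

print(cost_num)
```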

=> Restaurant Type Column

=> Listed In type column

There are not many types in this category, so there is no need to clean the data.

=> Dishes Liked

Here the outlier is the null value: 28078 of the values are null. The rest state which dish was liked. So I am going to label the null values as 'not participants'.

Now I am going to cluster the least liked items into 'others', that is, dishes liked fewer than 50 times.
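A sketch of both steps on made-up values. For simplicity I am assuming each row holds a single dish label, and I use a cut-off of 2 instead of 50 so that the toy data shows the effect.

```python
import pandas as pd

# Made-up dish values; None stands for a missing entry.
dish = pd.Series(["Biryani", None, "Biryani", "Waffles", None, "Paratha"])

# Step 1: label the missing values.
dish = dish.fillna("not participants")

# Step 2: relabel dishes that appear fewer than min_count times.
counts = dish.value_counts()
min_count = 2  # toy value; the notebook text uses 50
dish = dish.where(dish.map(counts) >= min_count, "others")

print(dish.value_counts())
```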

Step 3: Statistical analysis & visualization

After data cleaning, statistical values such as the mean, standard deviation and quartiles can be found with the describe() function.
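A short illustration of `describe()` on a made-up frame with two numeric columns; the values are not taken from the dataset.

```python
import pandas as pd

# Made-up numeric columns standing in for the cleaned data.
df = pd.DataFrame({
    "rate": [4.1, 3.8, 3.7, 4.4],
    "votes": [100, 50, 200, 50],
})

# count, mean, std, min, quartiles and max for every numeric column.
summary = df.describe()

print(summary)
```

Individual values can be read from the result with `summary.loc["mean", "votes"]` and similar lookups.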

Data Visualization

Visualizing Location

I am going to count the number of times each location appears.

Fig 1 shows the number of restaurants area-wise in Bengaluru.

Visualizing online order facility

Fig 2 shows how many restaurants provide an online ordering facility. About 30000+ restaurants in Bengaluru provide online ordering. About 20000+ restaurants do not.

Visualizing table Booking Facility

Fig 3 shows the reservation system. About 80% of restaurants do not provide reservations; 20% of them do.
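Percentages like these can be computed directly from the column. A sketch on made-up values, assuming the column holds `Yes`/`No` strings:

```python
import pandas as pd

# Made-up Yes/No values chosen to give an 80/20 split.
book_table = pd.Series(["No", "No", "No", "No", "Yes"])

# normalize=True returns fractions instead of raw counts.
share = book_table.value_counts(normalize=True) * 100

print(share.round(1))
```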

Visualizing Restaurant Type

Fig 4 shows the percentage share of each restaurant type. About 37% of the restaurants are casual dining centers, 20% are quick bites, 7.2% are cafes and 4.4% are dessert parlors.

Visualizing dishes liked

Visualizing the dish_liked Column

Fig 5 shows the dishes liked by customers. The rest of the items liked are Chicken Biryani, Masala Dosa, Paratha and Waffles, in decreasing order of percentage. This data is insufficient to say which dish people like best, because the share of people who have not rated any dish is larger and the list also contains many dishes with small counts.

Visualizing cuisines

Fig 6 shows the cuisines prepared in the restaurants. Restaurants serving North Indian dishes are the most popular. After that come North Indian Chinese, South Indian, Biryani spots, Bakery and Desserts, in decreasing order of popularity.

Visualizing the category of restaurant (listed_in (type) column)

Fig 7 shows the categories of restaurant. 50.2% (25942) of the restaurants are buffets. 34.4% (17779) are cafes. Fast food joints make up 6.95% (3593). Bakeries and sweet parlours, which come under Desserts, are 3.33% (1723). Dine-outs are 2.13% (1101). Drinks and nightlife restaurants are 1.71% (882). Bars and pubs constitute 1.35% (697).

Visualizing rating out of 5

Fig 8 shows the distribution of rating versus the number of ratings. The average rating is about 4.0.

Visualizing approximate cost for 2 people

Fig 9: shows the distribution of cost for two people dining versus the number of restaurants having the same cost.

Step 4: Analysis of data for the following questions

I want to use this data to answer the following questions

  1. How many people have rated restaurants?
  2. How does the online ordering facility affect the rating?
  3. How does online booking facility affect the rating?
  4. Which restaurants, in which locations, provide an online ordering facility?
  5. Which restaurants, in which locations, provide an online booking facility?
  6. How much does it cost for 2 people dining based on different location?
  7. What type of restaurant should be opened based on different location?
  8. What type of cuisines should be prepared for best business?

1. How many people have rated restaurants?

Fig 10 shows the distribution of ratings against the number of people who voted. Regardless of the restaurant, 5000+ people gave a rating of 4.7. The next most common rating is 4.9, given by 4000+ people. There are bad ratings too.

2. How does the online ordering facility affect the rating?

Fig 11 shows the online ordering facility versus rating. The average rating of restaurants that have an online ordering facility is higher than that of restaurants that do not. Surprisingly, the restaurants that do not provide online service outnumber the ones that do.
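The comparison of average ratings can be sketched with a group-by. The values below are made up purely to show the mechanics and say nothing about the real result; the column names `online_order` and `rate` are assumptions.

```python
import pandas as pd

# Made-up ratings for restaurants with and without online ordering.
df = pd.DataFrame({
    "online_order": ["Yes", "Yes", "No", "No"],
    "rate": [4.0, 4.2, 3.6, 3.8],
})

# Average rating within each group.
mean_rate = df.groupby("online_order")["rate"].mean()

print(mean_rate)
```

The same pattern works for the table booking question by grouping on the booking column instead.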

3. How does online booking facility affect the rating?

Fig 12 shows online table booking versus rating. The average rating of restaurants that have an online table booking facility is higher than that of restaurants that do not.

4. Which restaurants, in which locations, provide an online ordering facility?

Since there are many location values, I am going to group them first and create a pivot table. Later I will plot a bar chart.
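A sketch of the pivot table on made-up rows, assuming columns named `location`, `online_order` and `name`. Counting names per cell is one way to get the number of restaurants; it is not necessarily how the original table was built.

```python
import pandas as pd

# Made-up rows: one row per restaurant.
df = pd.DataFrame({
    "location": ["BTM", "BTM", "BTM", "HSR", "HSR"],
    "online_order": ["Yes", "Yes", "No", "Yes", "No"],
    "name": ["A", "B", "C", "D", "E"],
})

# Rows are locations, columns are Yes/No, cells count the restaurants.
pivot = df.pivot_table(
    index="location",
    columns="online_order",
    values="name",
    aggfunc="count",
    fill_value=0,
)

print(pivot)
```

With matplotlib installed, `pivot.plot(kind="bar")` draws one group of bars per location from this table.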

Fig 10. shows the distribution of online ordering restaurants location-wise in Bengaluru. The minor areas grouped together show the highest counts, both for restaurants that allow online ordering and for those that do not. Setting those aside, BTM Layout has the most restaurants providing online ordering service. In every location, the restaurants that provide online ordering outnumber the offline ones.

5. Which restaurants, in which locations, provide an online booking facility?

Fig 11. shows the table booking facility by location. About 5000 restaurants in BTM Layout do not provide an online table booking facility. The same holds for restaurants in the other locations, though the number of restaurants varies.

6. How much does it cost for 2 people dining based on different location?

Fig 12: shows the distribution of approximate cost for two people dining across all the restaurants in each area. Lavelle Road restaurants are much costlier than those in any other area; Residency Road, Richmond Road, Ulsoor, Church Street and Indiranagar are also costly. The other areas are comparatively cheaper.

7. What type of restaurant should be opened based on different location?

Fig 13: shows the types of restaurant and their categories. Delivery-type restaurants outnumber buffets, pubs and bars.

8. What type of cuisines should be prepared for best business?

Fig 14: shows the variation across different cuisines.

Step 5: Summary and Conclusion

Summary

The above analysis can be summarized as follows

Conclusion

The dataset collected from Zomato can be treated as an NLP problem: the text in the dataset can be further used for sentiment analysis or a recommendation system. Based on the above analysis, Biryani, North Indian and South Indian cuisines are the most popular, and BTM Layout is one of the hotspots for dining. Restaurants that offer online ordering have earned a higher average rating than those that do not. Likewise, restaurants that offer reservations, online or offline, have a higher average rating than those that do not. Fast food chains and delivery chains are the most popular and have gained higher ratings as well.

All in all, anyone opening a restaurant in Bengaluru should treat online ordering and reservation facilities as important for earning a higher rating and keeping good customers. Busy areas like BTM Layout, Whitefield and Indiranagar are going to be pretty hard to survive in, as the competition is higher.

References

www.kaggle.com